
perf: avoid replay transition concatenation #649

Merged
TATP-233 merged 1 commit into main from fix/appo-collector-param-and-replay-add on Jun 29, 2026

TATP-233 (Collaborator)

Summary

  • write replay transitions directly into packed shared-memory columns instead of building a full concatenated transition tensor first
  • preserve wraparound and terminal next-observation patching behavior for actor and critic storage
  • remove the unused APPO num_workers constructor parameter: APPO uses a single collector path, and the training kwargs no longer expose collector-count knobs
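The first two summary bullets can be illustrated with a minimal sketch. It makes the following assumptions, and it is not the repository's implementation:

* A single packed 2-D float32 buffer holds all fields, and each field owns a contiguous column range.
* A NumPy array stands in for the shared-memory tensor.
* The names `PackedReplay`, `fields`, `done` and `terminal` are hypothetical.
* The terminal arrays carry the true final observation that should replace an auto-reset observation.

Each field is copied straight into its column slice, with no concatenated row tensor. The copy is split in two when the batch crosses the end of the ring. Rows flagged `done` then have their next-observation columns overwritten with that terminal observation.

```python
import numpy as np


class PackedReplay:
    """Hypothetical ring buffer: every field lives in one packed 2-D array.

    Each field owns a contiguous column range of `data`. A plain NumPy array
    stands in for the shared-memory buffer used by the real collector.
    """

    def __init__(self, capacity: int, widths: dict) -> None:
        self.capacity = capacity
        self.cols = {}
        start = 0
        for name, width in widths.items():
            self.cols[name] = slice(start, start + width)
            start += width
        self.data = np.zeros((capacity, start), dtype=np.float32)
        self.ptr = 0   # next row to write
        self.size = 0  # number of valid rows

    def _write(self, cols: slice, values: np.ndarray) -> None:
        # Copy one field into its column slice; if the batch runs past the
        # end of the ring, the remainder wraps around to row 0.
        n = values.shape[0]
        first = min(n, self.capacity - self.ptr)
        self.data[self.ptr:self.ptr + first, cols] = values[:first]
        if first < n:
            self.data[:n - first, cols] = values[first:]

    def add(self, fields: dict, done: np.ndarray, terminal: dict) -> None:
        """Write one batch. `terminal` maps next-observation field names to
        full-batch arrays; only the rows where `done` is True are read."""
        n = done.shape[0]
        if n > self.capacity:
            raise ValueError("batch larger than buffer capacity")
        for name, values in fields.items():
            self._write(self.cols[name], values)
        # Terminal patching: overwrite only the done rows, mapping batch
        # positions to ring rows with the same wraparound as the write.
        done_idx = np.flatnonzero(done)
        if done_idx.size:
            rows = (self.ptr + done_idx) % self.capacity
            for name, final_obs in terminal.items():
                self.data[rows, self.cols[name]] = final_obs[done_idx]
        self.ptr = (self.ptr + n) % self.capacity
        self.size = min(self.size + n, self.capacity)
```

Splitting the copy at the ring boundary keeps both writes as plain slice assignments. Only the usually small set of done rows goes through fancy indexing during the terminal patch.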

Validation

  • make test-all passed in the local macOS dev environment
  • targeted microbenchmark of the replay-add hot path (4096 rows; obs=98, action=12, critic=101): direct column writes take 0.250 ms/batch vs. 0.379 ms/batch for the old cat-row path, a ~1.52x replay-add speedup
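One hedged way to reproduce this kind of comparison outside the training code is a small harness like the one below. It is a NumPy stand-in, not the PyTorch shared-memory path the PR changes, so its absolute timings will not match the PR's numbers. Several pieces are hypothetical:

* the packed layout (obs | action | critic)
* the buffer capacity
* the names `add_cat_rows`, `add_direct_columns`, `bench` and `compare`

Both paths must leave identical buffers, which is the first thing worth checking before trusting any timing.

```python
import time

import numpy as np

ROWS, OBS, ACT, CRITIC = 4096, 98, 12, 101  # batch shape quoted in the PR
CAPACITY = 4 * ROWS                          # hypothetical buffer size
WIDTH = OBS + ACT + CRITIC                   # hypothetical layout: obs | action | critic


def add_cat_rows(buf, ptr, obs, act, critic):
    # Old-style path: build the full concatenated transition, then copy it.
    rows = np.concatenate([obs, act, critic], axis=1)
    buf[ptr:ptr + rows.shape[0]] = rows


def add_direct_columns(buf, ptr, obs, act, critic):
    # New-style path: copy each field straight into its column slice.
    end = ptr + obs.shape[0]
    buf[ptr:end, :OBS] = obs
    buf[ptr:end, OBS:OBS + ACT] = act
    buf[ptr:end, OBS + ACT:] = critic


def bench(fn, buf, fields, iters):
    """Mean milliseconds per batch. CAPACITY is a multiple of ROWS, so no
    write in this harness crosses the end of the buffer."""
    start = time.perf_counter()
    for i in range(iters):
        fn(buf, (i * ROWS) % CAPACITY, *fields)
    return (time.perf_counter() - start) / iters * 1e3


def compare(iters=200, seed=0):
    rng = np.random.default_rng(seed)
    fields = [rng.standard_normal((ROWS, w), dtype=np.float32) for w in (OBS, ACT, CRITIC)]
    buf_cat = np.zeros((CAPACITY, WIDTH), dtype=np.float32)
    buf_col = np.zeros_like(buf_cat)
    ms_cat = bench(add_cat_rows, buf_cat, fields, iters)
    ms_col = bench(add_direct_columns, buf_col, fields, iters)
    return ms_cat, ms_col, buf_cat, buf_col


if __name__ == "__main__":
    ms_cat, ms_col, _, _ = compare()
    print(f"cat-row {ms_cat:.3f} ms/batch, direct columns {ms_col:.3f} ms/batch, "
          f"ratio {ms_cat / ms_col:.2f}x")
```

A more careful measurement would add warm-up iterations, interleave the two paths, and repeat the runs, since whichever path runs first can absorb cache and allocator effects.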

Required Follow-up Validation

  • Please run on an Ubuntu CUDA training host and compare end-to-end collector throughput and training-iteration timing before and after this change.
  • Specifically verify that the replay-add improvement translates to a measurable collector-side or iteration-level gain under a real CUDA learner workload.
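For the requested before/after comparison, one hedged way to turn raw per-iteration wall times into summary numbers is sketched below. It assumes a fixed number of environment steps per iteration and per-iteration timings already collected from each build. The functions `summarize` and `compare_runs` and the `warmup` cutoff are hypothetical. When timing CUDA work, call `torch.cuda.synchronize()` before reading the clock so that queued kernels are included in the measurement.

```python
from statistics import median


def summarize(iter_seconds, steps_per_iter):
    """Median iteration time (s) and collector throughput (env steps/s)."""
    t = median(iter_seconds)
    return t, steps_per_iter / t


def compare_runs(before, after, steps_per_iter, warmup=5):
    """Relative change of `after` vs `before`; a negative time change is faster."""
    # Skip warm-up iterations (allocator growth, lazy init) in both runs.
    t0, thr0 = summarize(before[warmup:], steps_per_iter)
    t1, thr1 = summarize(after[warmup:], steps_per_iter)
    return {
        "iter_time_change": (t1 - t0) / t0,
        "throughput_change": (thr1 - thr0) / thr0,
    }
```

Using the median rather than the mean keeps a few slow iterations, such as checkpointing or evaluation steps, from dominating the comparison.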

TATP-233 requested a review from caozx1110 (code owner) on June 29, 2026 14:35
TATP-233 merged commit c7af13d into main on Jun 29, 2026
6 checks passed
TATP-233 deleted the fix/appo-collector-param-and-replay-add branch on June 29, 2026 15:25